feat(agents-runtime): Sandbox primitive + Docker/E2B providers + sandbox profile picker #4369
msfstef wants to merge 7 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #4369 +/- ##
==========================================
+ Coverage 56.08% 60.51% +4.43%
==========================================
Files 263 327 +64
Lines 28601 34774 +6173
Branches 8003 9593 +1590
==========================================
+ Hits 16040 21044 +5004
- Misses 12546 13712 +1166
- Partials 15 18 +3
✅ Deploy Preview for electric-next ready!
Claude Code Review

Summary: Iteration 6 adds one CI-fix commit (0def597) - a lockfile sync that removes the `undici` entry stranded in `pnpm-lock.yaml` when f3e44a5 dropped the dep from `agents-runtime/package.json`. No source changes; the prior architectural verdict carries forward.

What is working well:

Issues Found - Suggestions (Nice to Have): Both suggestions from iterations 3-5 are unchanged in iteration 6. No new logic was touched, so neither was addressed. Restating only as a carry-forward; the reasoning is unchanged.

- `applyInheritedSandbox` silently overrides sibling fields when `inherit: true` (packages/agents-server/src/routing/sandbox.ts:78-93). `sandboxChoiceSchema` makes every field optional, so the wire permits `inherit: true` alongside `profile`/`key`/`scope`/`persistent`/`owner`. `applyInheritedSandbox` short-circuits on `requested.inherit` and discards the siblings, so e.g. `inherit: true, persistent: false` silently keeps the parent durability. A one-line comment on `applyInheritedSandbox` (or an XOR refinement on the schema) is the lower-cost fix.
- Implicit POSIX-absolute invariant on `workingDirectory` (packages/agents-runtime/src/sandbox/path-containment.ts:17-22). `absoluteSandboxPath` falls back to `posix.resolve(workingDirectory, path)` for relative inputs. If a caller ever constructs `DockerSandbox`/`RemoteSandbox` with a relative or non-POSIX `workingDirectory`, the result joins against the runtime POSIX cwd - wrong for a path that is supposed to name a container/VM location. A guard like `if (!posix.isAbsolute(workingDirectory)) throw` in each provider constructor would turn a silent foot-gun into a loud one.

Issue Conformance: No linked issue. The DNS-to-private-IP SSRF gap explicitly documented at net-policy.ts:114-118 remains untracked - still worth filing so the gap is not lost as the providers evolve.

Previous Review Status: Iteration 5 carried two suggestions forward; both still apply. Iteration 6 introduces one new commit, and it is purely a CI-unblock - a stale-lockfile fix for f3e44a5. No functional changes, no new surface, no regressions. The prior architectural verdict stands.

Review iteration: 6 | 2026-05-27
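Both carried-forward suggestions are small enough to sketch. The block below is a hedged illustration, not the PR's code: the `SandboxChoice` shape is assumed from the field names listed above, and `assertInheritIsExclusive` / `assertPosixAbsolute` are hypothetical helper names. Only `posix.isAbsolute` and `posix.normalize` from `node:path` are real APIs.

```typescript
import { posix } from "node:path";

// Assumed wire shape: field names come from the review text, types are guesses.
interface SandboxChoice {
  inherit?: boolean;
  profile?: string;
  key?: string;
  scope?: "entity" | "wake";
  persistent?: boolean;
  owner?: boolean;
}

// Suggestion 1 (hypothetical helper): reject `inherit: true` combined with any
// sibling field instead of silently discarding the siblings.
function assertInheritIsExclusive(choice: SandboxChoice): void {
  if (!choice.inherit) return;
  const siblings = (["profile", "key", "scope", "persistent", "owner"] as const)
    .filter((field) => choice[field] !== undefined);
  if (siblings.length > 0) {
    throw new Error(`inherit: true cannot be combined with: ${siblings.join(", ")}`);
  }
}

// Suggestion 2 (hypothetical helper): fail loudly when a provider would be
// constructed with a relative or non-POSIX working directory.
function assertPosixAbsolute(workingDirectory: string): string {
  if (!posix.isAbsolute(workingDirectory)) {
    throw new Error(`workingDirectory must be POSIX-absolute, got "${workingDirectory}"`);
  }
  return posix.normalize(workingDirectory);
}
```

Either check would run before any path is joined, so a bad input surfaces at construction or validation time rather than as a misplaced file later.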
Electric Agents Mobile Build
Android preview build for commit
…viders
Adds the `Sandbox` primitive (`@electric-ax/agents-runtime/sandbox`) that
isolates the filesystem, process, and network operations performed by
LLM-driven tools, and routes the bash/read/write/edit/fetch_url tools through
it.
Providers:
- `unrestrictedSandbox` — in-process host pass-through, the default for built-in
entities via `chooseDefaultSandbox`. Single-tenant trusted-code default: the
tool layer contains reads/writes to the workspace and rejects symlink escapes,
but it is NOT a containment boundary (host FS/PID namespace shared).
- `dockerSandbox` — container isolation via `dockerode` (optional peer dep).
Hardened HostConfig (CapDrop ALL, no-new-privileges, pids/mem/cpu limits, no
docker socket). deny-all ⇒ NetworkMode=none (hard boundary); any other policy
gets a bridge, where the allowlist is fetch-tool-only surface protection, not
an exec/bash egress boundary.
- `remoteSandbox({provider:'e2b'})` — off-host VM via E2B (optional peer dep),
with reattach / persistence / desktop support.
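
The docker hardening described above maps onto Docker Engine API `HostConfig` fields. The following is a minimal sketch under stated assumptions: `NetworkPolicy`, `HardenedHostConfig`, and `buildHostConfig` are hypothetical names, the limit values are placeholders, and the socket check is a plain string match (it does not realpath-resolve the host path).

```typescript
// Assumed policy shape; the commit message only names the three policy kinds.
type NetworkPolicy =
  | { kind: "deny-all" }
  | { kind: "allow-all" }
  | { kind: "allowlist"; hosts: string[] };

// Subset of the Docker Engine API HostConfig used by this sketch.
interface HardenedHostConfig {
  CapDrop: string[];
  SecurityOpt: string[];
  PidsLimit: number;
  Memory: number; // bytes
  NanoCpus: number; // 1e9 = one CPU
  NetworkMode: "none" | "bridge";
  Binds: string[];
}

function buildHostConfig(policy: NetworkPolicy, binds: string[] = []): HardenedHostConfig {
  // Never hand the container a path to the Docker socket.
  for (const bind of binds) {
    const hostPath = bind.split(":")[0] ?? "";
    if (hostPath.endsWith("/docker.sock")) {
      throw new Error(`refusing to mount the Docker socket: ${hostPath}`);
    }
  }
  return {
    CapDrop: ["ALL"],
    SecurityOpt: ["no-new-privileges"],
    PidsLimit: 256, // placeholder limit
    Memory: 512 * 1024 * 1024, // placeholder limit
    NanoCpus: 1_000_000_000, // placeholder limit
    // deny-all is the only hard network boundary; every other policy gets a bridge.
    NetworkMode: policy.kind === "deny-all" ? "none" : "bridge",
    Binds: binds,
  };
}
```

Note that `allowlist` and `allow-all` produce the same `NetworkMode` here, which mirrors the point that the allowlist is enforced at the fetch tool rather than at the container network.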
Lifecycle (resolveSandboxIdentity): identity from an explicit cross-entity
`key` or a `scope` shorthand ('entity' default ⇒ entityUrl, 'wake' ⇒
entityUrl#wakeId); `persistent` selects idle teardown (stop/preserve vs
remove/wipe); `owner` gates create-vs-attach so an `inherit` subagent only
attaches to an owner's live sandbox and never conjures a fresh one. Per-key
locked, refcounted, debounced teardown with deterministic naming so a
cold-started host reattaches by key.
Profiles: runtimes advertise named profiles (e.g. `local`, `docker`); the
agents-server validates a spawn's chosen profile against the target runner's
advertised set and enforces co-location for shared local sandboxes; the
new-session UI surfaces a picker.
Hardening / behavior:
- bash drops host `process.env` (removes the trivial secret-dump leak).
- read/write/edit reject symlink escapes from the workspace.
- docker exec polls `inspect()` until reaped (no transient null exit codes).
- boot orphan sweep reclaims only exited *ephemeral* leftovers — never a running
(possible live peer) or persistent (reattachable) container.
- fetch SSRF guard canonicalizes encoded IP literals (decimal/hex integer,
::ffff-mapped, bracketed IPv6).
`createFetchUrlTool` and the other tool factories now require a `Sandbox`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…by-runner
- Reorder new-session composer pickers to Model → Effort → Runner → Sandbox → Working Directory (working dir last; still hidden for remote profiles, docker keeps local-like behavior).
- Add clickable runner + sandbox badges to the entity header (detail popovers) and enrich the sidebar hover info with runner/sandbox rows.
- Surface the *effective* sandbox: when an entity has no explicit profile the runtime falls back to the host `local` sandbox (process-wake.ts) and never persists it, so the UI now resolves that default and every entity shows Local / Docker / E2B, not just ones spawned with an explicit pick.
- Expose the entity's pinned runner from `dispatch_policy` (already allow-listed server-side) on the UI entity schema/collection + optimistic spawn insert; resolve runner/sandbox labels from the runners collection.
- Add "Group by → Runner" and a "Show → Runner" filter to the sidebar.
- Tests for groupByRunner and the runner/effective-sandbox resolvers.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The entities Electric shape proxy allowlist omitted `sandbox`, so the profile an entity was spawned with (Local / Docker / E2B) never reached the UI: `entity.sandbox` was always undefined client-side even though the column is populated at spawn (entity-registry.createEntity). This made the header/sidebar sandbox badge (and the timeline's sandbox pill) always fall back to the "Local" default regardless of the real profile.
Add `sandbox` to the entities column allowlist and a regression test asserting both `sandbox` and `dispatch_policy` are exposed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- SSRF guard: parse all inet_aton IPv4 forms (shorthand/octal/hex) so
127.1, 0177.0.0.1, etc. can't bypass the private-IP denylist
- UnrestrictedSandbox: enforce post-dispose use via assertLive(), keeping
the cross-provider conformance invariant honest
- docker makeBinds: realpath-resolve mount hostPaths before the
docker-socket check so a symlink can't smuggle the socket in
- process-wake: clarify the SandboxError('unavailable') message — a dropped
profile fails only that wake (caught per-wake) and is redriven by the
server; the runner stays up
- e2b: thread an optional logger so the keep-alive heartbeat failure leaves
a debug trail instead of silently swallowing every error
- unrestricted resolveWithin: TODO documenting the multi-tenant TOCTOU window
- docker profile: fix the misleading "network constrained" description
(default is allow-all) and note network policy can become a per-spawn arg
- fetch_url: surface SandboxError('policy') as a distinct "blocked by
network policy" message, mirroring the FS tools
- bash tool: note the host env is not forwarded
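
The `inet_aton` parsing named in the first bullet can be sketched as follows. This is an illustrative reimplementation, not the code in `net-policy.ts`: all function names are hypothetical, the private-range list is a representative subset, and hostnames that are not IPv4 literals return `null` (resolving them is outside this sketch).

```typescript
// Parse one dotted component as hex (0x..), octal (leading 0), or decimal.
function parsePart(part: string): number | null {
  if (/^0x[0-9a-f]+$/i.test(part)) return parseInt(part.slice(2), 16);
  if (/^0[0-7]*$/.test(part)) return parseInt(part, 8);
  if (/^[1-9][0-9]*$/.test(part)) return parseInt(part, 10);
  return null;
}

// inet_aton semantics: 1 to 4 parts, where the last part fills all remaining bytes.
function inetAton(host: string): number | null {
  const parts = host.split(".");
  if (parts.length > 4) return null;
  const nums: number[] = [];
  for (const part of parts) {
    const n = parsePart(part);
    if (n === null) return null;
    nums.push(n);
  }
  const last = nums[nums.length - 1];
  if (last === undefined) return null;
  const leading = nums.slice(0, -1);
  if (leading.some((n) => n > 0xff)) return null;
  if (last > 2 ** (8 * (4 - leading.length)) - 1) return null;
  return leading.reduce((sum, n, i) => sum + n * 2 ** (8 * (3 - i)), last);
}

// Canonical dotted-quad form, or null when the host is not an IPv4 literal.
function canonicalIPv4(host: string): string | null {
  const value = inetAton(host);
  if (value === null) return null;
  return [24, 16, 8, 0].map((shift) => (value >>> shift) & 0xff).join(".");
}

// Representative subset of private, loopback, and link-local ranges.
function isPrivateIPv4(dotted: string): boolean {
  const [a = 0, b = 0] = dotted.split(".").map(Number);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

function isBlockedLiteral(host: string): boolean {
  const dotted = canonicalIPv4(host);
  return dotted !== null && isPrivateIPv4(dotted);
}
```

Canonicalizing first and matching the denylist second means each shorthand only needs to be handled in one place; `127.1` and `0177.0.0.1` both reduce to `127.0.0.1` before the range check runs.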
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reduce novel-pattern surface in the sandbox feature by mirroring the established dispatch_policy structure, and share two duplicated provider helpers. No behavior change.
- Move sandbox spawn resolution off EntityManager into routing/sandbox.ts (sibling of routing/dispatch-policy.ts), split into the orchestrator plus applyInheritedSandbox / resolveChosenProfileRemote / assertSharedSandboxColocated.
- Extract the sandbox choice wire schema to sandbox-choice-schema.ts and a single SandboxChoice type, collapsing three hand-written copies (router schema, TypedSpawnRequest.sandbox, resolver param).
- Share path containment (absoluteSandboxPath / isPathWithinSandbox) between the docker and remote providers; unrestricted keeps its stricter realpath walk.
- Share the dispose wipe-vs-preserve core (sandboxWipesOnDispose); each provider keeps its own owner-gating, which genuinely differs.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Follow-up cleanups from a cross-package alignment review of the sandbox
work, keeping it consistent with the dispatch_policy precedent:
- runtime: rename docker provider `reuseKey` -> `sandboxKey` so all three
sandbox providers share one contract field
- server: drop dead `listSandboxProfileNames`; collapse `SandboxProfileInput`
into `SandboxProfileAdvertisement`
- runtime: remove unused `undici` dep; tighten `dockerode` peer floor to >=5
- ui: use `shortenId` instead of inlined truncation; drop an `as never` cast;
resolve the EntityTimeline sandbox badge to its advertised label
- runtime: document the display-only `{ profile }` membership-row narrowing
- changeset: reword the tool-factory note to match the patch bump
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit f3e44a5 dropped the unused `undici` dependency from agents-runtime/package.json without regenerating pnpm-lock.yaml, breaking every CI job at `pnpm install --frozen-lockfile`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

Adds the `Sandbox` primitive to the agents runtime, a pluggable abstraction that isolates the filesystem, process, and network operations performed by LLM-driven tool calls, and wires it end-to-end through the runtime, agents-server, desktop, and new-session UI.

The primitive

`@electric-ax/agents-runtime/sandbox` exposes a deliberately small `Sandbox` interface: `exec`, FS methods (`readFile`/`writeFile`/`mkdir`/`readdir`/`exists`/`remove`/`stat`), `fetch` (egresses through the sandbox's own network), and `dispose`. `SandboxError` carries a `policy | runtime | unavailable` kind. Containment is documented per concern rather than promised uniformly: writes contained on every provider; reads contained on unrestricted/docker; in-workspace symlink escapes rejected on unrestricted. `name` is a free-form provider id for logs, not a capability discriminator.

Providers

- `unrestrictedSandbox`: in-process pass-through over `node:fs`/`child_process`; the built-in default. A single-tenant, trusted-code default: the tool layer contains reads/writes to the workspace and rejects symlink escapes, but it shares the host FS/PID namespace and is not a containment boundary.
- `dockerSandbox`: hardened container isolation via `dockerode` (optional peer dep): CapDrop ALL, no-new-privileges, no docker socket, pids/mem/cpu limits. `deny-all` ⇒ `NetworkMode=none` (the hard network boundary); any other policy gets a bridge, where the allowlist + SSRF guard are fetch-tool-only surface protection, not an `exec`/bash egress boundary. Exported under `/sandbox/docker` so callers needing only `unrestricted` don't pull `dockerode`.
- `remoteSandbox({provider: 'e2b'})`: first-class adapter for E2B's npm SDK (optional peer dep): reattaches to a shared workspace by key and defers lifecycle to the platform (see below). The `RemoteSandboxClient` interface makes adding Vercel/Daytona/etc. mechanical.

Lifecycle & identity

`resolveSandboxIdentity` derives three orthogonal facts from an entity's sandbox config + the live wake:

- … `key`, or a `scope` shorthand (`'entity'` default ⇒ `entityUrl`; `'wake'` ⇒ `entityUrl#wakeId` for full per-wake isolation). "Full isolation" is just a unique key, never a separate code path.
- … (`persistent`): selects the owner's idle teardown, preserve (stop/suspend, reattachable) vs wipe (remove/kill).
- … (`owner`): an owner creates and governs teardown; a non-owner (an `inherit` spawn) only attaches to an already-live sandbox and never conjures a fresh one.

Per-key locked, refcounted, debounced teardown with deterministic container/workspace naming so a cold-started host reattaches by key.
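
The three facts above can be sketched as one pure function. This is a hedged illustration of the rules as described, not the actual `resolveSandboxIdentity` signature: the `SandboxConfig`, `WakeContext`, and `SandboxIdentity` types are assumptions, as are the defaults chosen for `persistent` (false) and `owner` (true).

```typescript
// Assumed input shapes; only the field names `key`, `scope`, `persistent`,
// and `owner` come from the description.
interface SandboxConfig {
  key?: string;
  scope?: "entity" | "wake";
  persistent?: boolean;
  owner?: boolean;
}

interface WakeContext {
  entityUrl: string;
  wakeId: string;
}

interface SandboxIdentity {
  key: string; // which sandbox this wake converges on
  persistent: boolean; // preserve vs wipe on idle teardown
  owner: boolean; // may create, or attach-only
}

function resolveIdentitySketch(config: SandboxConfig, wake: WakeContext): SandboxIdentity {
  // An explicit key wins; otherwise the scope shorthand picks the key.
  const scopedKey =
    config.scope === "wake" ? `${wake.entityUrl}#${wake.wakeId}` : wake.entityUrl;
  return {
    key: config.key ?? scopedKey,
    persistent: config.persistent ?? false, // assumed default
    owner: config.owner ?? true, // assumed default
  };
}
```

Because per-wake isolation is only a different key, the caller that acquires, refcounts, and tears down a sandbox never needs to branch on scope.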
`processWake` constructs the sandbox once per wake-session and disposes it in the outer `finally` (handlers must not call `dispose()`).

Sandbox profiles (advertise / validate / pick)

- … SandboxProfiles (`name`, `label`, `description?`, `remote?`, local `factory`). Built-ins: `local` (always), `docker` (only when the Docker daemon is reachable), and `e2b` (only when `E2B_API_KEY` is set and the optional `e2b` dep is installed), so the UI never offers a non-functional choice.
- … `0010_sandbox_profiles` adds `runners.sandbox_profiles` and `entities.sandbox`.
- … `sandbox` selection; the server validates the chosen profile against the target runner's advertised set (or, for unpinned dispatch, a tenant-wide check) and rejects unserviceable choices up front.

E2B remote provider: shareable, persistent, desktop-ready

`remoteSandbox({provider: 'e2b'})` is a first-class shareable provider, mirroring how the Docker provider handles shared/persistent containers:

- … `sandboxKey` (e2b metadata); a wake looks it up via `list` + `connect` (which auto-resumes a paused VM) and only creates one when none is alive, so collaborators and later wakes, even on a freshly cold-started host, converge on the same workspace. A cross-host create race resolves deterministically (oldest wins). Private (per-entity) sandboxes are created fresh per wake.
- … `lifecycle: { onTimeout: 'pause' }` and kept alive by a `setTimeout` heartbeat while a wake holds them (e2b's timeout is absolute, not idle, so activity doesn't refresh it). `dispose()` just stops the heartbeat: the platform auto-suspends the VM on idle (filesystem + memory preserved, reattachable for e2b's paused-retention window) with no explicit teardown and no cross-host refcount. Private sandboxes still `kill()` on dispose.
- … `remote` flag flows runtime → runner advertisement → server. A shared local sandbox still requires its collaborators to be pinned to a single runner (the container lives on one host); a shared remote sandbox is reachable from any runner, so the single-runner guard is skipped for it.
- … `e2b` profile gated on an `E2B_API_KEY` credential (stored alongside the Anthropic/OpenAI/Brave keys and mirrored into the runtime env on save), and externalizes the optional `e2b` dep from the Electron main bundle. The new-session picker keeps an explicit profile choice across runner re-advertisement and hides the working-directory control for remote profiles. Horton reports and reads `AGENTS.md` from the sandbox's own working directory (`/work` in the VM) rather than a host path, so the model never sees paths its tools can't reach.

Tool refactor + hardening (folded in)

- … (`createFetchUrlTool`, read/write/edit, bash) now require a `Sandbox` parameter and route through it.
- … `process.env` to children: removes the trivial `env`-dump leak of secrets like `$ANTHROPIC_API_KEY`. (The host-sharing `unrestricted` provider still can't fully contain secrets, e.g. via `/proc/<ppid>/environ`; use `docker`/`remote` for untrusted or multi-tenant entities.)
- … (`unrestricted.resolveWithin`, the CVE-2025-53109/53110-shape defense) and surfaced through the tools.
- … `exec` polls `inspect()` until the exec is reaped, so a cleanly-exited command never returns a transient `null` exit code.
- … (`sweepOrphanedDockerSandboxes`) reclaims only exited ephemeral leftovers; it never force-removes a running container (possible live sibling on a shared daemon) or a persistent one (meant to be reattached by key).
- `fetch` SSRF guard canonicalizes encoded IP literals (decimal/hex integer, `::ffff:`-mapped, bracketed IPv6) so they can't bypass the private/link-local/metadata check.

Built-in entities (Horton, Worker) default to `unrestrictedSandbox` via `chooseDefaultSandbox(workingDirectory)`. Stronger isolation is opt-in by selecting the `docker`/`e2b` profile or constructing `dockerSandbox`/`remoteSandbox` directly.

What this primitive is and is not

Targets host isolation for LLM-driven tool calls (cwd escape, env-var exfil, arbitrary network egress, symlink traversal). It does not address prompt-injection-driven misuse of otherwise-legitimate tools.
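
Two of the cases named above, cwd escape and symlink traversal, come down to comparing real paths rather than requested paths. The sketch below shows that idea with Node's `fs` and `path` modules. It is not the PR's `resolveWithin`: the helper names are hypothetical, it assumes a POSIX-like host where symlinks can be created, and it does nothing about the TOCTOU window between the check and the later file operation.

```typescript
import { lstatSync, mkdtempSync, realpathSync, symlinkSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { basename, dirname, join, resolve, sep } from "node:path";

function entryExists(path: string): boolean {
  try {
    lstatSync(path);
    return true;
  } catch {
    return false;
  }
}

// Resolve symlinks in the deepest existing ancestor, then re-append the
// not-yet-created tail so writes to new files can still be checked.
function realpathAllowingMissing(path: string): string {
  const missingTail: string[] = [];
  let current = path;
  while (!entryExists(current) && dirname(current) !== current) {
    missingTail.unshift(basename(current));
    current = dirname(current);
  }
  return join(realpathSync(current), ...missingTail);
}

// Hypothetical containment check: throws when the real target leaves the workspace.
function resolveWithinSketch(workspace: string, requested: string): string {
  const root = realpathSync(workspace);
  const real = realpathAllowingMissing(resolve(root, requested));
  if (real !== root && !real.startsWith(root + sep)) {
    throw new Error(`policy: "${requested}" escapes the workspace`);
  }
  return real;
}

// Usage: a workspace containing one file and one symlink that points outside it.
const workspace = mkdtempSync(join(tmpdir(), "ws-"));
const outside = mkdtempSync(join(tmpdir(), "outside-"));
writeFileSync(join(workspace, "notes.txt"), "ok");
writeFileSync(join(outside, "secret.txt"), "nope");
symlinkSync(outside, join(workspace, "link"));

function isAllowed(requested: string): boolean {
  try {
    resolveWithinSketch(workspace, requested);
    return true;
  } catch {
    return false;
  }
}
```

With this setup, `isAllowed("notes.txt")` and `isAllowed("new/file.txt")` pass, while `isAllowed("../x")` and `isAllowed("link/secret.txt")` are rejected. A dangling symlink makes `realpathSync` throw, so the check fails closed.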
Provider-specific limitations (documented in the interface, not promised uniformly):

- `unrestrictedSandbox` is not a containment boundary; it shares the host FS/PID namespace. Tool-layer policy (workspace + symlink containment, host-env scrubbing) shrinks the blast radius but does not stop host-level reads (e.g. `/proc`) or SSRF from `fetch_url`.
- … `deny-all` (`NetworkMode=none`) and `allow-all`; `allowlist` is enforced host-side at the `fetch` tool only, and code run via `exec`/bash has direct bridge egress.
- `sandbox.fetch()` on `remoteSandbox` runs an HTTP client inside the VM via `exec`, so egress is governed by the VM's network controls.

Test plan
- Cross-provider conformance suite pins the `Sandbox` contract across unrestricted / remote (in-memory fake of the `RemoteSandboxClient` SDK contract) / docker (gated on daemon availability).
- Per-provider suites: docker lifecycle + keyed reattach + scoped orphan sweep (running/persistent preserved, exited-ephemeral reclaimed); unrestricted containment + tool-layer symlink safety; net-policy SSRF incl. encoded-literal canonicalization; profiles; tool-refactor.
- E2B (mock-based, no live account in CI): reattach-by-key, keep-alive heartbeat, suspend-vs-kill dispose (`sandbox-remote.test.ts`).
- Server-side spawn validation (`electric-agents-sandbox-spawn.test.ts`) incl. a shared remote profile bypassing the single-runner guard while a shared local one still requires pinning; `runners-router.test.ts` round-trips the `remote` flag.
- Verified locally: `agents-runtime` (724 tests) + `agents-server` + `agents` suites green; all packages typecheck clean; docker integration suite (incl. new exec-reap + scoped-sweep tests) green against a live daemon.
- CI matrix exercises the Docker path on Linux
- Manual smoke test of `remoteSandbox({provider: 'e2b'})` against a real E2B account (verified on desktop: shared sandbox resolves to `remote:e2b` at `/work`, reattaches across wakes)

🤖 Generated with Claude Code